Overview

Dataset statistics

Number of variables: 10
Number of observations: 683
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 46
Duplicate rows (%): 6.7%
Total size in memory: 53.5 KiB
Average record size in memory: 80.2 B

Variable types

Numeric: 9
Categorical: 1

Alerts

Dataset has 46 (6.7%) duplicate rows (Duplicates)
Clump Thickness is highly correlated with Uniformity of Cell Size and 7 other fields (High correlation)
Uniformity of Cell Size is highly correlated with Clump Thickness and 8 other fields (High correlation)
Uniformity of Cell Shape is highly correlated with Clump Thickness and 7 other fields (High correlation)
Marginal Adhesion is highly correlated with Clump Thickness and 7 other fields (High correlation)
Single Epithelial Cell Size is highly correlated with Clump Thickness and 7 other fields (High correlation)
Bare Nuclei is highly correlated with Clump Thickness and 7 other fields (High correlation)
Bland Chromatin is highly correlated with Clump Thickness and 7 other fields (High correlation)
Normal Nucleoli is highly correlated with Clump Thickness and 8 other fields (High correlation)
Mitoses is highly correlated with Uniformity of Cell Size and 2 other fields (High correlation)
Class is highly correlated with Clump Thickness and 8 other fields (High correlation)
Clump Thickness is highly correlated with Uniformity of Cell Size and 6 other fields (High correlation)
Uniformity of Cell Size is highly correlated with Clump Thickness and 7 other fields (High correlation)
Uniformity of Cell Shape is highly correlated with Clump Thickness and 7 other fields (High correlation)
Marginal Adhesion is highly correlated with Uniformity of Cell Size and 6 other fields (High correlation)
Single Epithelial Cell Size is highly correlated with Clump Thickness and 7 other fields (High correlation)
Bare Nuclei is highly correlated with Clump Thickness and 7 other fields (High correlation)
Bland Chromatin is highly correlated with Clump Thickness and 7 other fields (High correlation)
Normal Nucleoli is highly correlated with Clump Thickness and 7 other fields (High correlation)
Class is highly correlated with Clump Thickness and 7 other fields (High correlation)
Clump Thickness is highly correlated with Uniformity of Cell Size and 2 other fields (High correlation)
Uniformity of Cell Size is highly correlated with Clump Thickness and 7 other fields (High correlation)
Uniformity of Cell Shape is highly correlated with Clump Thickness and 7 other fields (High correlation)
Marginal Adhesion is highly correlated with Uniformity of Cell Size and 6 other fields (High correlation)
Single Epithelial Cell Size is highly correlated with Uniformity of Cell Size and 6 other fields (High correlation)
Bare Nuclei is highly correlated with Uniformity of Cell Size and 6 other fields (High correlation)
Bland Chromatin is highly correlated with Uniformity of Cell Size and 6 other fields (High correlation)
Normal Nucleoli is highly correlated with Uniformity of Cell Size and 6 other fields (High correlation)
Mitoses is highly correlated with Class (High correlation)
Class is highly correlated with Clump Thickness and 8 other fields (High correlation)
Clump Thickness is highly correlated with Uniformity of Cell Size and 7 other fields (High correlation)
Uniformity of Cell Size is highly correlated with Clump Thickness and 7 other fields (High correlation)
Uniformity of Cell Shape is highly correlated with Clump Thickness and 7 other fields (High correlation)
Marginal Adhesion is highly correlated with Clump Thickness and 7 other fields (High correlation)
Single Epithelial Cell Size is highly correlated with Clump Thickness and 8 other fields (High correlation)
Bare Nuclei is highly correlated with Clump Thickness and 7 other fields (High correlation)
Bland Chromatin is highly correlated with Clump Thickness and 7 other fields (High correlation)
Normal Nucleoli is highly correlated with Clump Thickness and 7 other fields (High correlation)
Mitoses is highly correlated with Single Epithelial Cell Size and 1 other field (High correlation)
Class is highly correlated with Clump Thickness and 8 other fields (High correlation)

Reproduction

Analysis started: 2022-10-10 16:52:57.545908
Analysis finished: 2022-10-10 16:53:03.798586
Duration: 6.25 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Clump Thickness
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.442166911
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.820761319
Coefficient of variation (CV): 0.6349966977
Kurtosis: -0.6331245309
Mean: 4.442166911
Median Absolute Deviation (MAD): 2
Skewness: 0.5876542361
Sum: 3034
Variance: 7.956694418
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1 | 139 | 20.4%
5 | 128 | 18.7%
3 | 104 | 15.2%
4 | 79 | 11.6%
10 | 69 | 10.1%
2 | 50 | 7.3%
8 | 44 | 6.4%
6 | 33 | 4.8%
7 | 23 | 3.4%
9 | 14 | 2.0%

Value | Count | Frequency (%)
1 | 139 | 20.4%
2 | 50 | 7.3%
3 | 104 | 15.2%
4 | 79 | 11.6%
5 | 128 | 18.7%
6 | 33 | 4.8%
7 | 23 | 3.4%
8 | 44 | 6.4%
9 | 14 | 2.0%
10 | 69 | 10.1%

Value | Count | Frequency (%)
10 | 69 | 10.1%
9 | 14 | 2.0%
8 | 44 | 6.4%
7 | 23 | 3.4%
6 | 33 | 4.8%
5 | 128 | 18.7%
4 | 79 | 11.6%
3 | 104 | 15.2%
2 | 50 | 7.3%
1 | 139 | 20.4%
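
The summary statistics reported for Clump Thickness can be cross-checked against its value counts. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and it assumes the sample (n - 1) definitions of variance and standard deviation and linear-interpolation quartiles, which reproduce the figures listed above.

```python
import statistics

# Value counts for Clump Thickness, taken from the frequency table above.
clump_counts = {1: 139, 2: 50, 3: 104, 4: 79, 5: 128,
                6: 33, 7: 23, 8: 44, 9: 14, 10: 69}

# Expand the counts back into one sorted observation per row.
observations = sorted(v for v, c in clump_counts.items() for _ in range(c))

n_obs = len(observations)                        # number of observations
total = sum(observations)                        # "Sum"
mean = statistics.mean(observations)             # "Mean"
variance = statistics.variance(observations)     # sample variance (n - 1)
std_dev = statistics.stdev(observations)         # sample standard deviation
cv = std_dev / mean                              # coefficient of variation
median = statistics.median(observations)

# Quartiles with linear interpolation between order statistics.
q1, _, q3 = statistics.quantiles(observations, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation: median of |x - median|.
mad = statistics.median(abs(x - median) for x in observations)

print(n_obs, total, round(mean, 6), round(variance, 6), round(cv, 6))
print(median, q1, q3, iqr, mad)
```

Because the counts fully determine the distribution, the same check works for any of the numeric variables by swapping in that variable's value counts.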

Uniformity of Cell Size
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.150805271
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.065144856
Coefficient of variation (CV): 0.9728131675
Kurtosis: 0.0736791399
Mean: 3.150805271
Median Absolute Deviation (MAD): 0
Skewness: 1.226404096
Sum: 2152
Variance: 9.395112987
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1 | 373 | 54.6%
10 | 67 | 9.8%
3 | 52 | 7.6%
2 | 45 | 6.6%
4 | 38 | 5.6%
5 | 30 | 4.4%
8 | 28 | 4.1%
6 | 25 | 3.7%
7 | 19 | 2.8%
9 | 6 | 0.9%

Value | Count | Frequency (%)
1 | 373 | 54.6%
2 | 45 | 6.6%
3 | 52 | 7.6%
4 | 38 | 5.6%
5 | 30 | 4.4%
6 | 25 | 3.7%
7 | 19 | 2.8%
8 | 28 | 4.1%
9 | 6 | 0.9%
10 | 67 | 9.8%

Value | Count | Frequency (%)
10 | 67 | 9.8%
9 | 6 | 0.9%
8 | 28 | 4.1%
7 | 19 | 2.8%
6 | 25 | 3.7%
5 | 30 | 4.4%
4 | 38 | 5.6%
3 | 52 | 7.6%
2 | 45 | 6.6%
1 | 373 | 54.6%

Uniformity of Cell Shape
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.21522694
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.988580818
Coefficient of variation (CV): 0.929508515
Kurtosis: -0.01681562061
Mean: 3.21522694
Median Absolute Deviation (MAD): 0
Skewness: 1.157890012
Sum: 2196
Variance: 8.931615308
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1 | 346 | 50.7%
10 | 58 | 8.5%
2 | 58 | 8.5%
3 | 53 | 7.8%
4 | 43 | 6.3%
5 | 32 | 4.7%
7 | 30 | 4.4%
6 | 29 | 4.2%
8 | 27 | 4.0%
9 | 7 | 1.0%

Value | Count | Frequency (%)
1 | 346 | 50.7%
2 | 58 | 8.5%
3 | 53 | 7.8%
4 | 43 | 6.3%
5 | 32 | 4.7%
6 | 29 | 4.2%
7 | 30 | 4.4%
8 | 27 | 4.0%
9 | 7 | 1.0%
10 | 58 | 8.5%

Value | Count | Frequency (%)
10 | 58 | 8.5%
9 | 7 | 1.0%
8 | 27 | 4.0%
7 | 30 | 4.4%
6 | 29 | 4.2%
5 | 32 | 4.7%
4 | 43 | 6.3%
3 | 53 | 7.8%
2 | 58 | 8.5%
1 | 346 | 50.7%

Marginal Adhesion
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.830161054
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.86456219
Coefficient of variation (CV): 1.012155187
Kurtosis: 0.9424072094
Mean: 2.830161054
Median Absolute Deviation (MAD): 0
Skewness: 1.509181064
Sum: 1933
Variance: 8.205716543
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1 | 393 | 57.5%
3 | 58 | 8.5%
2 | 58 | 8.5%
10 | 55 | 8.1%
4 | 33 | 4.8%
8 | 25 | 3.7%
5 | 23 | 3.4%
6 | 21 | 3.1%
7 | 13 | 1.9%
9 | 4 | 0.6%

Value | Count | Frequency (%)
1 | 393 | 57.5%
2 | 58 | 8.5%
3 | 58 | 8.5%
4 | 33 | 4.8%
5 | 23 | 3.4%
6 | 21 | 3.1%
7 | 13 | 1.9%
8 | 25 | 3.7%
9 | 4 | 0.6%
10 | 55 | 8.1%

Value | Count | Frequency (%)
10 | 55 | 8.1%
9 | 4 | 0.6%
8 | 25 | 3.7%
7 | 13 | 1.9%
6 | 21 | 3.1%
5 | 23 | 3.4%
4 | 33 | 4.8%
3 | 58 | 8.5%
2 | 58 | 8.5%
1 | 393 | 57.5%

Single Epithelial Cell Size
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.234260615
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 2
Q3: 4
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.223085456
Coefficient of variation (CV): 0.6873550777
Kurtosis: 2.129639279
Mean: 3.234260615
Median Absolute Deviation (MAD): 0
Skewness: 1.703716401
Sum: 2209
Variance: 4.942108947
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
2 | 376 | 55.1%
3 | 71 | 10.4%
4 | 48 | 7.0%
1 | 44 | 6.4%
6 | 40 | 5.9%
5 | 39 | 5.7%
10 | 31 | 4.5%
8 | 21 | 3.1%
7 | 11 | 1.6%
9 | 2 | 0.3%

Value | Count | Frequency (%)
1 | 44 | 6.4%
2 | 376 | 55.1%
3 | 71 | 10.4%
4 | 48 | 7.0%
5 | 39 | 5.7%
6 | 40 | 5.9%
7 | 11 | 1.6%
8 | 21 | 3.1%
9 | 2 | 0.3%
10 | 31 | 4.5%

Value | Count | Frequency (%)
10 | 31 | 4.5%
9 | 2 | 0.3%
8 | 21 | 3.1%
7 | 11 | 1.6%
6 | 40 | 5.9%
5 | 39 | 5.7%
4 | 48 | 7.0%
3 | 71 | 10.4%
2 | 376 | 55.1%
1 | 44 | 6.4%

Bare Nuclei
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.54465593
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.64385716
Coefficient of variation (CV): 1.027986138
Kurtosis: -0.7988441354
Mean: 3.54465593
Median Absolute Deviation (MAD): 0
Skewness: 0.9900156547
Sum: 2421
Variance: 13.27769501
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1 | 402 | 58.9%
10 | 132 | 19.3%
2 | 30 | 4.4%
5 | 30 | 4.4%
3 | 28 | 4.1%
8 | 21 | 3.1%
4 | 19 | 2.8%
9 | 9 | 1.3%
7 | 8 | 1.2%
6 | 4 | 0.6%

Value | Count | Frequency (%)
1 | 402 | 58.9%
2 | 30 | 4.4%
3 | 28 | 4.1%
4 | 19 | 2.8%
5 | 30 | 4.4%
6 | 4 | 0.6%
7 | 8 | 1.2%
8 | 21 | 3.1%
9 | 9 | 1.3%
10 | 132 | 19.3%

Value | Count | Frequency (%)
10 | 132 | 19.3%
9 | 9 | 1.3%
8 | 21 | 3.1%
7 | 8 | 1.2%
6 | 4 | 0.6%
5 | 30 | 4.4%
4 | 19 | 2.8%
3 | 28 | 4.1%
2 | 30 | 4.4%
1 | 402 | 58.9%

Bland Chromatin
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.445095168
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.449696573
Coefficient of variation (CV): 0.7110678959
Kurtosis: 0.1676456428
Mean: 3.445095168
Median Absolute Deviation (MAD): 1
Skewness: 1.095270469
Sum: 2353
Variance: 6.001013297
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
3 | 161 | 23.6%
2 | 160 | 23.4%
1 | 150 | 22.0%
7 | 71 | 10.4%
4 | 39 | 5.7%
5 | 34 | 5.0%
8 | 28 | 4.1%
10 | 20 | 2.9%
9 | 11 | 1.6%
6 | 9 | 1.3%

Value | Count | Frequency (%)
1 | 150 | 22.0%
2 | 160 | 23.4%
3 | 161 | 23.6%
4 | 39 | 5.7%
5 | 34 | 5.0%
6 | 9 | 1.3%
7 | 71 | 10.4%
8 | 28 | 4.1%
9 | 11 | 1.6%
10 | 20 | 2.9%

Value | Count | Frequency (%)
10 | 20 | 2.9%
9 | 11 | 1.6%
8 | 28 | 4.1%
7 | 71 | 10.4%
6 | 9 | 1.3%
5 | 34 | 5.0%
4 | 39 | 5.7%
3 | 161 | 23.6%
2 | 160 | 23.4%
1 | 150 | 22.0%

Normal Nucleoli
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.869692533
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.052666407
Coefficient of variation (CV): 1.063760794
Kurtosis: 0.4735882982
Mean: 2.869692533
Median Absolute Deviation (MAD): 0
Skewness: 1.420431124
Sum: 1960
Variance: 9.318772193
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
1 | 432 | 63.3%
10 | 60 | 8.8%
3 | 42 | 6.1%
2 | 36 | 5.3%
8 | 23 | 3.4%
6 | 22 | 3.2%
5 | 19 | 2.8%
4 | 18 | 2.6%
7 | 16 | 2.3%
9 | 15 | 2.2%

Value | Count | Frequency (%)
1 | 432 | 63.3%
2 | 36 | 5.3%
3 | 42 | 6.1%
4 | 18 | 2.6%
5 | 19 | 2.8%
6 | 22 | 3.2%
7 | 16 | 2.3%
8 | 23 | 3.4%
9 | 15 | 2.2%
10 | 60 | 8.8%

Value | Count | Frequency (%)
10 | 60 | 8.8%
9 | 15 | 2.2%
8 | 23 | 3.4%
7 | 16 | 2.3%
6 | 22 | 3.2%
5 | 19 | 2.8%
4 | 18 | 2.6%
3 | 42 | 6.1%
2 | 36 | 5.3%
1 | 432 | 63.3%

Mitoses
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.603221083
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.732674146
Coefficient of variation (CV): 1.080745609
Kurtosis: 12.27337364
Mean: 1.603221083
Median Absolute Deviation (MAD): 0
Skewness: 3.511476241
Sum: 1095
Variance: 3.002159697
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
1 | 563 | 82.4%
2 | 35 | 5.1%
3 | 33 | 4.8%
10 | 14 | 2.0%
4 | 12 | 1.8%
7 | 9 | 1.3%
8 | 8 | 1.2%
5 | 6 | 0.9%
6 | 3 | 0.4%

Value | Count | Frequency (%)
1 | 563 | 82.4%
2 | 35 | 5.1%
3 | 33 | 4.8%
4 | 12 | 1.8%
5 | 6 | 0.9%
6 | 3 | 0.4%
7 | 9 | 1.3%
8 | 8 | 1.2%
10 | 14 | 2.0%

Value | Count | Frequency (%)
10 | 14 | 2.0%
8 | 8 | 1.2%
7 | 9 | 1.3%
6 | 3 | 0.4%
5 | 6 | 0.9%
4 | 12 | 1.8%
3 | 33 | 4.8%
2 | 35 | 5.1%
1 | 563 | 82.4%

Class
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB
Category counts: 2 (444), 4 (239)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 683
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 444 | 65.0%
4 | 239 | 35.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
2 | 444 | 65.0%
4 | 239 | 35.0%

Most occurring characters

Value | Count | Frequency (%)
2 | 444 | 65.0%
4 | 239 | 35.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 683 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 444 | 65.0%
4 | 239 | 35.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 683 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 444 | 65.0%
4 | 239 | 35.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 683 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 444 | 65.0%
4 | 239 | 35.0%

Interactions

[Figure: interaction plots for pairs of variables (81 panels)]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
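
The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is a minimal illustration, not the report's own implementation; the helper names are hypothetical, and tied values are assumed to share the average of the ranks they span.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables over the product of their std devs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)   # the 1/n factors cancel

# A monotonic but nonlinear relationship still yields a perfect rank correlation.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Averaging tied ranks matters for this dataset, where every numeric variable takes only a handful of integer values and ties are therefore the norm.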

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
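
As a minimal sketch of this formula (not the report's own code), the function below computes r directly from the definition. The function name is hypothetical, and the two input lists are assumed to be the Clump Thickness and Uniformity of Cell Size values of the ten first rows shown in the Sample section.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n - 1)) normalisers cancel between numerator and denominator.
    return cov / sqrt(var_x * var_y)

clump_thickness = [5, 5, 3, 6, 4, 8, 1, 2, 2, 4]
cell_size = [1, 4, 1, 8, 1, 10, 1, 1, 1, 2]

r = pearson_r(clump_thickness, cell_size)

# Shifting and rescaling either variable (with a positive scale) leaves r unchanged.
r_rescaled = pearson_r([2 * x + 7 for x in clump_thickness],
                       [0.5 * y - 3 for y in cell_size])
print(r, r_rescaled)
```

The second call illustrates the location and scale invariance mentioned above; a negative scale factor would flip the sign of r without changing its magnitude.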

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
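
The pair-counting definition above corresponds to the variant usually called tau-a, and can be sketched as follows. The function name is hypothetical. Note that library implementations commonly default to the tie-adjusted tau-b variant, so on heavily tied data such as these 1 to 10 scores the values in the report may differ from what this sketch returns.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = total = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        total += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:        # both variables move the same way
            concordant += 1
        elif direction < 0:      # the variables move in opposite ways
            discordant += 1
    return (concordant - discordant) / total

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))   # fully concordant
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))         # two concordant pairs, one discordant
```

The quadratic loop over all pairs is fine for a few hundred observations; larger datasets would normally use an O(n log n) algorithm instead.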

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

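Index | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class
0 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2
1 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 2
2 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 2
3 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 2
4 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2
5 | 8 | 10 | 10 | 8 | 7 | 10 | 9 | 7 | 1 | 4
6 | 1 | 1 | 1 | 1 | 2 | 10 | 3 | 1 | 1 | 2
7 | 2 | 1 | 2 | 1 | 2 | 1 | 3 | 1 | 1 | 2
8 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 5 | 2
9 | 4 | 2 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 2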

Last rows

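Index | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class
673 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 8 | 2
674 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 1 | 2
675 | 5 | 10 | 10 | 5 | 4 | 5 | 4 | 4 | 1 | 4
676 | 3 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2
677 | 3 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 2
678 | 3 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 2
679 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2
680 | 5 | 10 | 10 | 3 | 7 | 3 | 8 | 10 | 2 | 4
681 | 4 | 8 | 6 | 4 | 3 | 4 | 10 | 6 | 1 | 4
682 | 4 | 8 | 8 | 5 | 4 | 5 | 10 | 4 | 1 | 4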

Duplicate rows

Most frequently occurring

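Index | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class | # duplicates
3 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 27
5 | 1 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 | 23
4 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 21
19 | 3 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 20
18 | 3 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 12
12 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 10
20 | 3 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 | 10
26 | 4 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 10
27 | 4 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 10
35 | 5 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 10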
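
A summary of this kind can be built by grouping identical rows and counting how often each one occurs. The sketch below uses only the Python standard library; the function name and the toy rows are illustrative assumptions, and it does not claim to reproduce how pandas-profiling computes its own duplicate statistics.

```python
from collections import Counter

def most_frequent_duplicates(rows, top=10):
    """Return (row, occurrences) for rows seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    repeated.sort(key=lambda item: item[1], reverse=True)
    return repeated[:top]

# Toy data in the same ten-column layout as the report (values are illustrative).
toy_rows = [
    (1, 1, 1, 1, 2, 1, 1, 1, 1, 2),
    (3, 1, 1, 1, 2, 1, 2, 1, 1, 2),
    (1, 1, 1, 1, 2, 1, 1, 1, 1, 2),
    (5, 4, 4, 5, 7, 10, 3, 2, 1, 2),
    (1, 1, 1, 1, 2, 1, 1, 1, 1, 2),
    (3, 1, 1, 1, 2, 1, 2, 1, 1, 2),
]

summary = most_frequent_duplicates(toy_rows)
for row, occurrences in summary:
    print(row, occurrences)
```

Rows that occur only once are left out, so the result lists only the repeated patterns together with their counts.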